Abstract: An important logistics application of robotics involves
manipulators that pick-and-place objects placed in warehouse
shelves. A critical aspect of this task corresponds to
detecting the pose of a known object in the shelf using visual 
data. Solving this problem can be assisted by the use of an 
RGBD sensor, which also provides depth information beyond 
visual data. Nevertheless, it remains a challenging problem since 
multiple issues need to be addressed, such as low illumination 
inside shelves, clutter, texture-less and reflective objects as well 
as the limitations of depth sensors. This paper provides a new 
rich dataset for advancing the state-of-the-art in RGBD-based 3D 
object pose estimation, which is focused on the challenges that 
arise when solving warehouse pick-and-place tasks. The publicly 
available dataset includes thousands of images and corresponding 
ground truth data for the objects used during the first Amazon 
Picking Challenge at different poses and clutter conditions. Each 
image is accompanied by ground truth information to assist
in the evaluation of algorithms for object detection. To show 
the utility of the dataset, a recent algorithm for RGBD-based 
pose estimation is evaluated in this paper. Given the measured 
performance of the algorithm on the dataset, this paper shows 
how it is possible to devise modifications and improvements to 
increase the accuracy of pose estimation algorithms. This process 
can be easily applied to a variety of different methodologies for 
object pose detection and improve performance in the domain of 
warehouse pick-and-place. 

Index Terms: Object detection, object recognition, robot
vision systems, manufacturing automation, manipulators.


I. Introduction

There is significant interest in warehouse automation,
which frequently involves pick-and-place tasks for products
located in shelving units. This interest is exemplified
by competitions such as the first Amazon Picking Challenge
(APC) [?], which brought together multiple academic and
industrial teams from around the world, as well as similar
competitions, like the RoboCup@Home Challenge [?]. One
way to approach the APC involved perception, motion planning
and grasping of 25 different objects, which were placed
in a semi-structured way inside the bins of an Amazon-Kiva
Pod. Solving such problems reliably can significantly alter
the logistics of distributing products. Frequently, manipulation
research on pick-and-place tasks has focused on flat surfaces,
such as tabletops. These are relatively simpler problems,
which do not involve many of the issues that often arise in
warehouse automation, where the presence of tight spaces,
such as shelves, plays a critical role.

Fig. 1. An example frame from the Rutgers dataset, where a pose estimate
generated by the test algorithm is superimposed.

Accurate pose estimation is crucial for successfully picking 
an object inside a shelf. In flexible warehouses, this pose will
not be a priori known but must be detected from sensors, 
especially visual ones. The increasing availability of RGBD 
sensors, which can simultaneously sense color and depth, 
brings the hope that such problems can be eventually solved 
reliably. But warehouse shelves have narrow, dark and obscuring
bins that complicate object detection. Clutter can further
challenge detection through the presence of multiple objects. 
A variety of object types need to be dealt with, some of which 
may be texture-less and not easily identifiable from color, 
and others reflective and virtually undetectable by a depth 
sensor. Furthermore, some popular depth sensors exhibit limits
in terms of the smallest and largest sensing radius that make it 
harder for a manipulator to utilize them. Thus, RGBD-based 
object detection and pose estimation is an active research area 
and a critical capability for warehouse automation. 

This paper provides tools that help in improving the performance
of object detection solutions for such challenges. In
particular, it describes a new rich dataset and software for 
utilizing it. The motivation is to better equip the research 
community in evaluating and improving robotic perception 
solutions for warehouse picking. The dataset contains over 
10,000 depth and RGB registered images, complete with hand- 
annotated 6DOF poses for 24 of the APC objects (for details, 
see Section 3). Also provided are 3D mesh models of the 
APC objects, which may be used for training of recognition 
algorithms. The code for utilizing and integrating the dataset 
with different algorithms is also publicly available. 

The dataset includes images of warehouse objects in a shelf 
environment. The objects are placed in different poses in 
various bins of warehouse shelves, so as to allow a variety 
of experimental conditions (Figure [?]). Multiple camera perspectives
and frames account for rich information, as well as
spatial and temporal variation in data. The effect of clutter is 
evaluated by controlling the presence of additional objects in 
a scene. 

The dataset is compared against the one available as part of 
the LINEMOD framework for object detection [?], to highlight
the need for additional varying conditions, such as clutter, 
camera perspective and noise, which affect pose detection. 
This is the chief contribution of the dataset, the utility of 
which is further evaluated by using the open-source implementation
of the LINEMOD framework [?], easily accessible
via OpenCV [?]. This paper does not argue that this algorithm
is the best solution for pose estimation in shelves. The method 
is used as an example of a modern, accessible algorithm for 
object detection, which at least performs effectively in tabletop 
setups. 

The dataset reveals that the considered algorithm faces 
significant difficulties when used in a warehouse scenario. 
This makes it possible to appreciate the features of warehouse picking,
which complicate pose estimation. With the aid of the dataset, 
it was also possible to identify algorithmic and engineering 
adaptations to increase performance in warehouse pick-and- 
place. 

Overall, the proposed dataset emphasizes the need for the 
development of pose detection algorithms that can operate 
robustly for a wide variety of objects and conditions, especially 
in narrow, dark and cluttered spaces. Such algorithms need to 
optimally utilize all available sources of sensing data and prior 
information. 

II. Related Work 

Datasets for the task of object recognition have rapidly 
grown in recent years both in terms of number as well as 
size and scale. The applications of such datasets include industrial
warehouse applications like the APC [?], and domestic
applications like the RoboCup@Home [?]. Some standard
RGB benchmarks for the task include CIFAR-10/100 [?],
ImageNet [7], and PASCAL VOC [?]. Some use bounding
boxes as ground truth and others use image segmentation 
with inliers/outliers for accuracy metrics. While useful for 
2D image object recognition, RGB datasets are not ideal in 
manipulation applications, which rely not only on segmenting 
the object of interest but also on accurate pose estimation. 


Up until the last decade, the problem of 3D recognition was
often addressed using a stereo camera. More recently, the
availability and widespread use of RGBD cameras have increased
interest in solutions to common 2.5D^1 problems, such as face,
object, and gesture recognition. Such technology has allowed
researchers to begin to build “modern-scale” datasets, which
help evaluate performance and identify challenges.
Several such datasets are described below.^2

Segmented Scenes Datasets 

B3DO [?]: A project by UC Berkeley, the dataset contains
more than 3,000 2.5D crowd-sourced images. The images primarily
focus on indoor scenes, where ground truth bounding boxes 
have been annotated for more than 50 object categories. The 
dataset has also been augmented to include (x, y, z) Cartesian 
coordinates for many object centroids. 

NYU Depth Dataset v2 [10]: The NYU dataset also
focuses on indoor scenes, but ground truth labels are presented 
as full image segmentation. The dataset includes around 500k 
2.5D images, with approximately 1,500 fully labeled ground 
truth images. 

Manipulation Datasets 

YCB Objects & Models [11]: A collaboration between
several robotics labs, the YCB dataset provides object models 
in a variety of formats for common household objects. The 
focus of this project is to create common metrics for the 
growing interest in robotic manipulation research by providing 
reliable benchmarks for several common manipulation tasks. 

Object Datasets 

Table-Top Object Dataset [12]: A collaboration between
Willow Garage and the Univ. of Michigan, this dataset consists
of ~1,000 2.5D images with ground truth labels for 480
frames. The objects presented belong to 3 different classes, 
each class consisting of approximately 10 different instances. 
Objects are shown on table tops in clutter of between 2-6 items 
per image. The images were collected using a structured light 
stereo camera. 

Solutions in Perception Dataset [13]: This dataset by
Willow Garage contains 35 objects in ~1,000 3D training images
and 120 test images. In training, objects were presented in
clutter with 6DOF ground truth for each item. The scenes were 
captured using RGBD cameras with objects on a turn-table to 
capture and reconstruct the scenes from multiple viewpoints. 
All images were captured using a consistent azimuth angle 
between the camera and the turntable. 

UW Dataset [14]: This large dataset consists of over 50
object categories and 300 distinct instances. It features objects
from multiple viewpoints, and is presented with ground truth
pose for one axis in [0, 2π].

LINEMOD Dataset [15]: As part of the body of work
detailing the LINEMOD framework, the authors released a
dataset of 18 object models and over 15,000 6D ground truth
annotated RGBD images. Objects in these images are shown
in clutter from a variety of viewpoints. Because of the size,
setting, and focus on 6D pose estimation, this dataset is the
most closely related to the current paper.

^1 2.5D refers here to the projection of a 2D image to 3D space, which results
in a sparse 3D image.

^2 A more complete list of available RGBD datasets can be found at:
http://www0.cs.ucl.ac.uk/staff/M.Firman/RGBDdatasets/

Fig. 2. (Left) Items used in the Amazon Picking Challenge 2015 and featured in the dataset. Three groups of objects are identified based on their effects
on pose estimation from RGBD data: a) cuboid and non-transparent, b) non-cuboid and non-transparent, c) transparent. (Right) An arrangement of the shelf
with the APC objects.

The dataset proposed here presents more than 10,000 ground 
truth annotated RGBD images of 24 objects of different types. 
As opposed to prior datasets [?], [?], [?], this new dataset is
specifically aimed at perception for robotic grasping and hence 
features full 6DOF ground truth poses for all 2.5D images. 
While some existing datasets [?], [15] provide ground truth
poses for objects in cluttered space, the new one additionally 
controls for clutter by presenting poses of the objects both 
with and without clutter. Other controls employed in data 
collection correspond to multiple viewpoints and collection 
of additional frames for control of slight noise in sensors. 
Additionally, scenes are not reconstructed as in alternatives
[?], but the dataset includes the transformation matrices
between the camera location, stationary robotic base, and 
object location. This allows users of the dataset to reconstruct 
the scene to suit their own methods. Lastly, this new dataset 
is specifically designed for warehouse perception tasks and is
focused on the placement of objects in narrow spaces, such as 
shelf bins. To the best of the authors’ knowledge, this is the 
first attempt to generate a real-world dataset for this important 
application. 

III. Rutgers APC RGBD Dataset 

This paper presents a large 2.5D dataset consisting of
10k+ images and corresponding ground truth 6DOF poses
for all these images, which is made available to the research
community.^3 The focus is on supporting warehouse
pick-and-place tasks. The accompanying software allows the easy
evaluation of object detection and pose estimation algorithms 
in this context. 

A. Objects and 3D-mesh Models 

The selected objects correspond to those that were used 
during the first Amazon Picking Challenge (APC) [?], which
took place in Seattle during May 2015. 

The provided dataset comes together with 3D-mesh object
models for each of the APC competition objects. For most
objects, the CAD 3D scale models of the objects were textured
using the open-source MeshLab software. For simple
geometric shapes, such as cuboids, this simple combination
of CAD modeling and texturing is sufficient and can yield
results of similar quality to more involved techniques [?]. For
objects with non-uniform geometries, models were produced
using 3D photogrammetric reconstruction from multiple views
of a monocular camera.

^3 It can be accessed online at the following URL:
http://www.pracsyslab.org/rutgers_apc_rgbd_dataset

Fig. 3. The data collection setup for the warehouse pick-and-place dataset:
A Motoman SDA10F robot and an Amazon-Kiva Pod stocked with objects.
At this configuration, the Kinect sensor mounted on the arm is used to detect
an object at the bottom row of shelf bins.

B. Dataset Design 

The intention with the dataset was to provide to the community
a large scale representation of the problem of 6DOF pose
estimation using current 2.5D RGBD technology in a cluttered 
warehouse shelf environment. The set of 25 APC objects that 
were part of the competition exhibit a variety of features in 
the problem domain, including different size, shape, texture 
and transparency. 

C. Extent of the Dataset 

Data collection was performed using a Microsoft Kinect v1
2.5D RGBD camera mounted to the end joint of a Motoman
Dual-arm SDA10F robot (Figure 3). Changing the intensity of
the structured light in the Kinect driver allowed operation at 
closer distances. 

To provide better coverage of the scene and the ability to 
perform pose estimation from multiple vantage points, data 
from 3 separate positions (referred to, here, as “mapping”
positions) were collected: i) one directly in front of the center
of a bin at a distance of 48cm, ii) a second roughly 10cm to 
the left of the first position, and iii) a third with the same 
distance to the right of the first position. Four 2.5D images 
were collected at each mapping position to account for noise. 

To measure the effects of clutter, for each object-pose 
combination, images were collected: (1) with only the object 
of interest occupying the bin, (2) with a single additional item 
of clutter within the bin, and (3) with two additional items 
of clutter. In all, the dataset can be broken down into the 
following parameters: 

• 24 Objects of interest^4

• 12 Bin locations per object 

• 3 Clutter states per bin 

• 3 Mapping positions per clutter state 

• 4 Frames per mapping position 

Considering all these parameters, the dataset is composed 
of a total of 10,368 2.5D images. For each image, there is 
a YAML file available containing the transformation matrices 
(rotation, translation) between: (1) the base of the robot and 
the camera, (2) the camera and the ground truth pose of the 
object, and (3) the base of the robot and ground truth pose of 
the object. 
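
As a quick consistency check, the total number of images follows from the product of the five parameters listed above. The sketch below enumerates one index tuple per image. The tuple layout and the constant names are illustrative assumptions and do not reflect the naming convention of the dataset files.

```python
from itertools import product

# Parameter counts taken from the breakdown of the dataset given above.
N_OBJECTS = 24
N_BINS = 12
N_CLUTTER_STATES = 3
N_MAPPING_POSITIONS = 3
N_FRAMES = 4


def enumerate_samples():
    """Yield one (object, bin, clutter, position, frame) index tuple per image.

    The index layout is hypothetical; it only mirrors the five parameters.
    """
    return product(range(N_OBJECTS), range(N_BINS), range(N_CLUTTER_STATES),
                   range(N_MAPPING_POSITIONS), range(N_FRAMES))


total_images = sum(1 for _ in enumerate_samples())
print(total_images)  # 10368
```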

The process for generating the ground truth data involved 
iterating over all the frames of the Rutgers APC dataset in 
a semi-manual manner. A human annotator translated and 
rotated the 3D model of the object in the corresponding RGBD 
point cloud scene using RViz. Every annotation superimposes 
the model to the corresponding portion of the point cloud.^5

IV. Opportunities for Pose Estimation 
Improvements Through the Dataset 

This paper employs a setup similar to that of the Amazon 
Picking Challenge (APC) to evaluate the proposed dataset, 
which is a helpful testing ground for robotic perception 
algorithms in a relatively controlled but realistic warehouse 
environment. The available software infrastructure for using 
the dataset allows the incorporation of different algorithms for 
this problem, given the rich literature on the subject El, El, 

El, GqI, El, El, El, HSl, El, El, El, El, El- 

The current paper utilizes one such approach that is easily 
accessible to the robotics community and corresponds to the 
LINEMOD algorithm [?], for which an implementation based
on the OpenCV library [?] is available.

LINEMOD is an object detection and pose estimation 
pipeline [?], which receives as input a 3D mesh object model.
From the model, various viewpoints and features from multiple 
modalities (RGB gradients, surface normals) are sampled. 
The features are filtered to a robust set and stored as a 
template for the object and the given viewpoint. This process 
is repeated until sufficient coverage of the object is reached 
from different viewpoints. The detection process implements
a template matching algorithm followed by several post-processing
steps to refine the pose estimate. The approach
was designed specifically for texture-less objects, which are
notoriously challenging for pose estimation methods based
on color and texture. LINEMOD uses surface normals in the
template matching algorithm and limits RGB gradient features
to the object’s silhouette.

^4 The “mead_index_cards” item from the APC list is not included as this
simplified the experimental process for collecting the data and it was the item
that exhibited the most redundant qualities.

^5 Additional details regarding naming conventions for the dataset and
instructions for download and use can be found at the project’s website:
http://www.pracsyslab.org/rutgers_apc_rgbd_dataset

Starting with the baseline open source implementation of 
LINEMOD, the paper shows the incremental performance improvements
achieved over the basic implementation algorithm
through the use of the Rutgers APC dataset. Most of the 
improvements are algorithm-agnostic and can be useful in 
general to warehouse detection and pose estimation tasks. 

A. Masking 

In the context of the APC, the bin of the shelf from which 
the object is detected and grasped is specified. In order to 
take advantage of such information, precise calibration of 
the shelf’s location with respect to the robot is performed 
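prior to detection. Using ROS’ TF functionality [30], it is
possible to compute the boundary of the current bin of
interest, $([x_{min}, x_{max}], [y_{min}, y_{max}], [z_{min}, z_{max}])$. Then, all
points $P_i = (p_i^x, p_i^y, p_i^z)$ are masked if

\[ \prod_{j \in \{x,y,z\}} \left( j_{max} \ge p_i^{j} \ge j_{min} \right) = 0, \]

where each comparison evaluates to 1 when it holds and to 0 otherwise.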
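
The masking rule can be sketched with NumPy as follows. This is a minimal illustration and not the implementation used in the paper: the function name and arguments are hypothetical, and the points are assumed to be already expressed in the same coordinate frame as the bin boundary (e.g., after applying the calibrated transform).

```python
import numpy as np


def mask_outside_bin(points, bin_min, bin_max):
    """Keep only the points that lie inside the axis-aligned bin boundary.

    points:  (N, 3) array of (x, y, z) coordinates.
    bin_min: (x_min, y_min, z_min) of the bin of interest.
    bin_max: (x_max, y_max, z_max) of the bin of interest.
    Returns the kept points and the boolean keep-mask of length N.
    """
    points = np.asarray(points, dtype=float)
    bin_min = np.asarray(bin_min, dtype=float)
    bin_max = np.asarray(bin_max, dtype=float)
    # One boolean per point and axis: is the coordinate within [min, max]?
    inside_per_axis = (points >= bin_min) & (points <= bin_max)
    # A point is kept only if the test holds on all three axes, i.e. the
    # product of the three indicators is 1. NaN coordinates fail the test.
    keep = inside_per_axis.all(axis=1)
    return points[keep], keep
```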

B. Post-processing 

Dynamically selecting a viable detection threshold value 
allows the algorithm to always allow a fixed number of 
detections through and increases the positive detection rate in 
the context of warehouse pick-and-place since it is known that 
the object is present in the scene. Pose detection is made more 
accurate by dividing the RGB image into four quadrants and 
doing a hue-saturation histogram comparison for individual 
quadrants. The method is similar to an existing one [?] with
the addition of quadrant processing, which aids in predicting 
the correct orientation of the object. 
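
The two post-processing ideas can be sketched as follows. The code is an illustration under stated assumptions rather than the implementation evaluated here: the function names, the 8x8 histogram resolution and the use of histogram intersection as the comparison measure are all assumptions, and the two images are assumed to be float RGB crops in [0, 1] (for instance, the detected region and a reference view of the object).

```python
import numpy as np


def dynamic_threshold(scores, n_keep):
    """Threshold that lets the n_keep best-scoring detections through.

    Detections with score >= threshold pass; ties may let a few more pass.
    """
    ranked = np.sort(np.asarray(scores, dtype=float))[::-1]
    if ranked.size == 0:
        raise ValueError("no detection scores given")
    return ranked[min(n_keep, ranked.size) - 1]


def hue_saturation(rgb):
    """Hue and saturation, both in [0, 1], of an (H, W, 3) RGB image in [0, 1]."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)
    c = v - rgb.min(axis=-1)                 # chroma
    safe_c = np.where(c > 0, c, 1.0)         # avoid division by zero
    hue = np.where(v == r, ((g - b) / safe_c) % 6.0,
                   np.where(v == g, (b - r) / safe_c + 2.0,
                            (r - g) / safe_c + 4.0))
    hue = np.where(c > 0, hue / 6.0, 0.0)    # gray pixels get hue 0
    sat = np.where(v > 0, c / np.where(v > 0, v, 1.0), 0.0)
    return hue, sat


def quadrant_histograms(rgb, bins=(8, 8)):
    """Normalized hue-saturation histogram for each of the four quadrants."""
    hue, sat = hue_saturation(rgb)
    h, w = hue.shape
    hists = []
    for rows in (slice(0, h // 2), slice(h // 2, h)):
        for cols in (slice(0, w // 2), slice(w // 2, w)):
            hist, _, _ = np.histogram2d(hue[rows, cols].ravel(),
                                        sat[rows, cols].ravel(),
                                        bins=bins, range=((0, 1), (0, 1)))
            total = hist.sum()
            hists.append(hist / total if total > 0 else hist)
    return hists


def quadrant_similarity(rgb_a, rgb_b, bins=(8, 8)):
    """Mean histogram intersection over the four quadrants, in [0, 1]."""
    pairs = zip(quadrant_histograms(rgb_a, bins), quadrant_histograms(rgb_b, bins))
    return float(np.mean([np.minimum(a, b).sum() for a, b in pairs]))
```

Comparing quadrants separately is what makes the measure sensitive to orientation: an image and its mirrored copy have identical whole-image histograms, but their quadrant histograms differ.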

C. Temporal smoothing 

A single query for object detection operates over a single 
frame of RGBD data from a single perspective. Capturing 
multiple frames from the same perspective helps in mitigating 
effects of noisy sensor data or inconsistent pose estimates. In 
the implementation of the temporal smoothing enhancement, 
12 frames of RGBD images are aggregated, with the most 
frequently reported pose estimate reported on the final frame. 
Effectively, over all $P_i \in P_{pose\_estimations}$, the reported pose is

\[ \arg\max_{P_i} \left( Q(P_i) + \sum_{P_j \in neigh(P_i)} \frac{Q(P_j)}{dist(P_i, P_j)} \right), \]

where $Q(P_i)$ is a quality measure of pose estimation $P_i$,
$dist(P_i, P_j)$ is a distance function between poses, and
$neigh(P_i)$ returns all other pose estimates within a small
neighborhood of $P_i$.
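
A minimal sketch of this selection rule is given below. For brevity it assumes that each pose estimate is reduced to its 3D position and that the distance function is the Euclidean distance between positions; a full 6DOF distance would also account for rotation. The function name, the neighborhood radius and the small constant guarding against division by zero are hypothetical.

```python
import numpy as np


def smooth_pose_estimates(positions, qualities, radius=0.05, eps=1e-6):
    """Select the pose estimate that is best supported by its neighbors.

    positions: (N, 3) positions of the pose estimates gathered over the frames.
    qualities: (N,) quality measure Q of each estimate.
    radius:    neighborhood size (assumed value, in the units of positions).
    Returns the index of the selected estimate and all N scores.
    """
    positions = np.asarray(positions, dtype=float)
    qualities = np.asarray(qualities, dtype=float)
    # Pairwise distances dist(P_i, P_j) between all estimates.
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # neigh(P_i): all other estimates within the radius.
    neighbors = dists <= radius
    np.fill_diagonal(neighbors, False)
    # Each neighbor contributes Q(P_j) / dist(P_i, P_j).
    weights = np.where(neighbors, 1.0 / np.maximum(dists, eps), 0.0)
    scores = qualities + weights @ qualities
    return int(np.argmax(scores)), scores
```

With this rule, a tight group of moderately rated estimates can outvote an isolated estimate with a higher individual quality, which is the intended smoothing effect.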

For objects where the likelihood of getting a good detection
is low, temporal smoothing might bias the final detection towards
bad pose estimations. Nevertheless, in this environment
the positive effects of smoothing pose estimates outweigh the
negatives on average.

Fig. 4. Scatter plots of raw pose estimation accuracy results for three example
objects from the APC dataset. X-axis is translational error (L2 dist) in meters,
Y-axis is rotational error in degrees.

Fig. 5. Scatter plots of raw pose estimation accuracy results for three example
objects from the LINEMOD dataset. Axes measure the same dimensions as
in Fig. 4.

V. Features and Comparison 

One of the defining characteristics of this dataset is the 
amount of control put into isolating certain environmental 
factors in its collection. To exemplify the importance of these 
controls, an analysis over the example LINEMOD algorithm 
is performed over both the proposed dataset and the original 
LINEMOD dataset. 

A. Effects of Clutter 

A situation known to cause difficulty for pose estimation
algorithms, including state-of-the-art solutions, is the
presence of significant clutter in the scene
containing the target object. For example, in a scene containing
only the target object, simple segmentation techniques may
provide reliable results. Adding, however, even a small number
of other objects with vaguely similar colors or other visual
features can easily cause simple approaches to fail. As such,
it is a high priority goal for current solutions to 6D pose
estimation problems to be as robust as possible to the presence
of clutter.

To elaborate on the description of the control for clutter, for 
every distinct target object pose across each of the 12 bins, the 
dataset provides frames (i) with the object alone in the bin, (ii) 
accompanied by a single clutter item, and (iii) accompanied 
by two clutter items. By doing so, users of the dataset can 
directly compare the accuracy of different algorithms under 
these three different conditions. 

Though the LINEMOD dataset shares similarities with the 
one proposed here (e.g., in terms of providing 6D ground truth, 
a variety of scenes and poses, and environments containing lots 
of clutter), it does not provide insights regarding the effects of 
clutter. In the comparison provided in this paper in Figures 4
and 5, these insights are exemplified. In the graph from Figure
4, which corresponds to the proposed dataset, a majority of the
inaccurate pose estimates arise from scenes containing more
clutter, but the overall effect is relatively small. In the plot from
Figure 5, corresponding to the LINEMOD dataset, inaccurate
pose estimates occur across all variations in clutter and are 
dominated by the translation error in cluttered scenes. This is 
a likely indication that the detection algorithm is confusing 
other objects for the object of interest and thus indicates a 
weaker detection strength for this target object. Specifically 
regarding the middle graph of Figure [?], there is a cluster of
inaccurate pose estimates at the 90-degree rotational error rate. 
Due to the cuboid shape of the target object, it is likely that the 
algorithm often estimates an incorrect orientation of the object. 
Made possible by the controls placed in the data collection of 
the proposed dataset, these types of observations are valuable 
when making improvements to pose estimation algorithms or 
when comparing different approaches to suit a specific task. 
Additionally, given the inconsistent accuracy observed when 
evaluating the LINEMOD dataset, these insights would be 
extremely useful in this case. 
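
The two error measures on the axes of the scatter plots can be computed from an estimated and a ground truth pose as sketched below. Both poses are assumed to be given as 4x4 homogeneous transforms in the same frame. Measuring the rotational error as the angle of the relative rotation is an assumption made for this illustration, and the function name is hypothetical.

```python
import numpy as np


def pose_errors(T_est, T_gt):
    """Translational (L2) and rotational (degrees) error between two poses.

    T_est, T_gt: 4x4 homogeneous transforms expressed in the same frame.
    """
    T_est = np.asarray(T_est, dtype=float)
    T_gt = np.asarray(T_gt, dtype=float)
    # Euclidean distance between the two translation vectors.
    trans_err = float(np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3]))
    # Angle of the relative rotation R_est * R_gt^T.
    R_delta = T_est[:3, :3] @ T_gt[:3, :3].T
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = float(np.degrees(np.arccos(cos_angle)))
    return trans_err, rot_err_deg
```

Under this measure, an estimate that confuses two adjacent faces of a cuboid object shows up as a rotational error of about 90 degrees with a small translational error.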

B. Coverage and Variety of Poses 

Another control prioritized when assembling the proposed 
dataset was to ensure a variety of ground truth poses for 
each object. At the same time, there was also the objective to 
choose poses such that they were representative of probable 
placements of objects in the APC competition. As such, 
all target placements are located several centimeters away 
from the front edge of the bin. The utility of the coverage 
characteristic is in allowing users to test their solutions when 
each of the major faces of the object is the primary viewable 
face. This makes it possible to immediately identify troublesome surfaces
and, given these insights, to design more robust solutions.
Figure 6 illustrates this control for one example object.

C. Viewpoint Variety 

Among the additional features of the proposed dataset is 
the accumulation of samples from several vantage points for 
each target object pose and clutter combination. Specifically, 
samples were collected for each configuration from three 
different viewpoints in front of the shelf: (i) left of, (ii) 
centered in front of, and (iii) right of the bin. Since the left and 
right positions may incur some level of occlusion of the target 
object by parts of the shelving unit, one of the applications 
of this feature is in the determination of the effects of these 
partial occlusions. Additionally, these samples can be used 
for pose hypothesis aggregation and smoothing, or for 3D 
reconstruction approaches. 

D. Noisy Sensing 

While Kinect v1 is an inexpensive and widely available
sensor, a major detriment in its use is the noise inherent in
RGBD samples produced using this equipment. To counteract
this, for each configuration and camera position, the dataset
provides four samples taken over a period of several seconds
with all objects and hardware stationary. Similar to the above,
this feature will allow users to easily determine which situations
and target objects are robust to this noise and which are
not.

Fig. 6. Simulated scene showing the variety of ground truth poses for one
example object from the proposed dataset, as it is rotated through the 12 bins
of the shelf.
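
One possible way to use the four repeated frames is to measure the per-pixel spread of the depth values, as in the sketch below. It is only an illustration: the function name is hypothetical, the frames are assumed to be already loaded as arrays of equal size, and invalid depth readings are assumed to be encoded as 0.

```python
import warnings

import numpy as np


def depth_noise_stats(depth_frames, invalid_value=0.0):
    """Per-pixel median, standard deviation and valid count over static frames.

    depth_frames:  (F, H, W) stack of depth images of an unchanged scene.
    invalid_value: value marking a missing depth reading (assumed to be 0).
    Pixels without any valid reading get NaN for the median and deviation.
    """
    frames = np.asarray(depth_frames, dtype=float)
    frames = np.where(frames == invalid_value, np.nan, frames)
    valid_count = np.sum(~np.isnan(frames), axis=0)
    with warnings.catch_warnings():
        # All-NaN pixels trigger RuntimeWarnings; NaN is the desired result.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        median = np.nanmedian(frames, axis=0)
        std = np.nanstd(frames, axis=0)
    return median, std, valid_count
```

Pixels with a large deviation or a low valid count indicate regions, such as reflective or transparent surfaces, where a single frame is unreliable.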

E. Extensions 

In addition to the above controls, an inherent feature of the 
dataset is that it can be used not only for single-object pose
estimation, but also for multi-object pose estimation. Because all transforms
from object to robotic base are stored in the ground truth pose 
files, users may easily extend this dataset to the multi-object 
case simply by reading in all ground truth poses of neighboring 
bins within the same “run”, or configuration, of data collection. 
Since within a single “run”, no item’s placement is changed, 
this is straightforward to do. And because the dataset is 
organized by these runs, the implementation is rather easy. 
This feature makes the dataset a good candidate for testing 
3D reconstruction techniques. 
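
The multi-object extension amounts to composing the stored transforms. The sketch below assumes that the rotation and translation of each transform have already been read from the ground truth files (the parsing is omitted, since the field names are not given here) and expresses the poses of several objects of the same run in the frame of one camera view. All function and object names are hypothetical.

```python
import numpy as np


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def invert_transform(T):
    """Invert a rigid transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def objects_in_camera_frame(T_base_camera, T_base_objects):
    """Express object poses given in the robot base frame in the camera frame.

    T_base_camera:  base-to-camera transform of the current view.
    T_base_objects: dict mapping an object name to its base-to-object transform,
                    e.g. collected from neighboring bins of the same run.
    """
    T_camera_base = invert_transform(T_base_camera)
    return {name: T_camera_base @ T for name, T in T_base_objects.items()}
```

Because the base frame is shared by all images of a run, the same composition works for any camera view and any subset of objects in that run.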

VI. Discussion 

This work contributes a large hand-annotated RGBD dataset 
with 6DOF ground truth poses. The dataset is specifically 
designed to support advancing solutions for the problem of 
pose estimation in tight environments that appear in warehouse 
picking problems. The extent and structure of the dataset provide
flexibility to researchers and allow them to use the data
to apply and evaluate pose estimation methods using a variety 
of different techniques. The dataset is not only large relative to 
alternatives but is also designed to allow evaluation of several 
additional factors that can affect pose estimation accuracy. 
The accompanying software allows for improvements that are 
agnostic to the pose detection algorithm. 

The evaluation, over the proposed dataset, of a pose estimation
algorithm that is easily available to the robotics community
emphasizes the difficulties that RGBD-based solutions face
when dealing with transparent and reflective surfaces [32].
Cuboid objects also pose some difficulties for algorithms that
are based primarily on RGBD data, but it was possible to deal
with these issues through the improvements described in this 
work, which were tailored to fit the context of a warehouse 
environment and provide robustness. 

The current dataset does not focus on the case of partially
occluded objects, where a pose estimation process may be used
to evaluate the pose of both the occluding and the occluded
objects so as to assist rearrangement manipulation algorithms
[33], [34]. Such problems can potentially benefit from
the utilization of cloud computation in order to improve
performance and deal with the inherent uncertainty in the pose
estimation and manipulation processes [35].

There is also an influx of new results in the area of machine 
learning that can potentially be applied for the problem of pose 
estimation for warehouse picking and it would be interesting to 
see the quality of such solutions given the available dataset. 
Similarly, more classical methods developed for monocular 
cameras that primarily depend upon color and texture may 
exhibit complementary behavior to the one displayed by the 
considered RGBD approach. Fusing such methods can also be 
another way of achieving solutions that operate robustly over 
a wide variety of object classes and environmental conditions. 
The dataset can be useful in the evaluation of solutions in the
context of related applications, such as directly detecting a
handle [?] or a grasp [?] from point cloud data.